Multi-Criteria Approaches to Markov Decision Processes with Uncertain Transition Parameters
Authors
Abstract
Markov decision processes (MDPs) are a well-established model for planning under uncertainty. In most situations, the MDP parameters are estimated from real observations, so their values are not known precisely. The literature contains several types of MDPs with uncertain, imprecise, or bounded transition rates or probabilities and rewards. Commonly, the resulting processes are optimized for the most robust policy, that is, the policy with the best worst-case behavior. However, implementing such a policy can lose reward in most situations. In general, one is interested in policies that behave well in all situations, which leads to a multi-objective view of decision making. In this paper, we consider policies for the discounted reward of MDPs with uncertain parameters. In particular, the approach is defined for bounded-parameter Markov decision processes (BMDPs) [GLD00]. In this setting, the worst-case, best-case, and average-case performance of a policy are analyzed. The paper presents theoretical results and algorithms to compute or approximate the convex hull of pure Pareto-optimal policies in the value-vector space.
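The multi-objective view can be illustrated with a minimal sketch under explicit assumptions. Each pure (deterministic, stationary) policy of a tiny BMDP is scored by its worst-case, best-case and "average"-case discounted value, and only the Pareto-optimal policies are kept. The worst and best cases use the greedy ordering from interval value iteration on BMDPs, in the spirit of [GLD00]. Start each transition row at its lower bounds, then pour the remaining probability mass into successor states in ascending (worst) or descending (best) order of current value.

Everything else is hypothetical and may differ from the paper's own definitions and algorithms:

- the toy BMDP itself;
- the function names;
- the uniform initial distribution used to turn value vectors into scalars;
- the "central" transition model standing in for the average case.

The sketch also returns only the Pareto-optimal pure policies. It does not compute their convex hull.

```python
from itertools import product

# Hypothetical toy BMDP: P(t | s, a) lies in [LO[s][a][t], HI[s][a][t]];
# rewards R[s][a] are exact here for simplicity (an assumption).
GAMMA = 0.9
LO = [[[0.6, 0.1], [0.2, 0.5]],
      [[0.3, 0.4], [0.0, 0.8]]]
HI = [[[0.9, 0.4], [0.5, 0.8]],
      [[0.6, 0.7], [0.2, 1.0]]]
R = [[1.0, 0.5],
     [0.0, 2.0]]


def extreme_dist(lo, hi, values, worst):
    """Feasible distribution that minimises (worst=True) or maximises the
    expected successor value: start at the lower bounds, then give the
    leftover mass to states sorted by value (greedy interval ordering)."""
    p = list(lo)
    rest = 1.0 - sum(lo)
    order = sorted(range(len(values)), key=lambda t: values[t], reverse=not worst)
    for t in order:
        add = min(hi[t] - lo[t], rest)
        p[t] += add
        rest -= add
    return p


def central_dist(lo, hi):
    """Hypothetical 'average' model: spread the leftover mass in proportion
    to interval widths (stays feasible whenever sum(lo) <= 1 <= sum(hi))."""
    width = [h - l for l, h in zip(lo, hi)]
    total = sum(width)
    rest = 1.0 - sum(lo)
    if total == 0.0:
        return list(lo)
    return [l + rest * w / total for l, w in zip(lo, width)]


def evaluate(policy, mode, iters=500):
    """Discounted value vector of a pure policy; mode is 'worst', 'best'
    or 'central'. Worst/best re-pick the extreme model at every sweep."""
    n = len(policy)
    v = [0.0] * n
    for _ in range(iters):
        new_v = []
        for s in range(n):
            a = policy[s]
            lo, hi = LO[s][a], HI[s][a]
            if mode == "central":
                p = central_dist(lo, hi)
            else:
                p = extreme_dist(lo, hi, v, worst=(mode == "worst"))
            new_v.append(R[s][a] + GAMMA * sum(pt * vt for pt, vt in zip(p, v)))
        v = new_v
    return v


def objectives(policy):
    """Criterion vector (worst, best, average); each entry is the mean of the
    value vector, i.e. a uniform initial distribution (an assumption)."""
    return tuple(sum(evaluate(policy, m)) / len(policy)
                 for m in ("worst", "best", "central"))


def dominates(x, y):
    """x Pareto-dominates y: no worse in every criterion, better in one."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def pareto_pure_policies(n_states=2, n_actions=2):
    """Enumerate all pure policies and keep the non-dominated ones."""
    scored = {pi: objectives(pi)
              for pi in product(range(n_actions), repeat=n_states)}
    return {pi: f for pi, f in scored.items()
            if not any(dominates(g, f) for g in scored.values())}


if __name__ == "__main__":
    for pi, f in sorted(pareto_pure_policies().items()):
        print(pi, [round(x, 3) for x in f])
```

Brute-force enumeration is used only because the toy model has four pure policies. For realistic state spaces the number of pure policies grows exponentially, which is why dedicated algorithms for computing or approximating the Pareto front are needed.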
Journal title:
Volume, Issue:
Pages: -
Publication date: 2015